{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Copyright (c) Cornac Authors. All rights reserved.*\n",
    "\n",
    "*Licensed under the Apache 2.0 License.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text to Graph Transformation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/text_to_graph.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://github.com/PreferredAI/cornac/blob/master/tutorials/text_to_graph.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume that we are working with some recommender model leveraging relationships among items, i.e., item network. However, for some dataset of interest, the latter modality is not available. Instead, we have access to another type of auxiliary information, let's say item textual descriptions. Rather than ignoring the available auxiliary information or changing to another algorithm integrating item texts, which could be costly, Cornac enables us to exploit the available textual information while using our model. This is possible thanks to Cornac's cross-modality transformation feature, which we will rely on in this tutorial.\n",
    "\n",
    "Instructions on how to install Cornac are available [here](https://github.com/PreferredAI/cornac#installation). From now on we assume that installation is completed. \n",
    "\n",
    "## Data loading\n",
    "\n",
    "Consider the Amazon Clothing dataset consisting of user-item ratings and item content information (e.g., text, visual features, context). For the purpose of this tutorial, assume that the item textual descriptions is our only source of auxiliary information. This dataset is accessible through Cornac, and we can simply load it as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import cornac\n",
    "from cornac.data import Reader\n",
    "from cornac.datasets import amazon_clothing\n",
    "\n",
    "item_texts, item_ids = amazon_clothing.load_text()\n",
    "ratings = amazon_clothing.load_feedback()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## From Text to Graph\n",
    "\n",
    "Recall that we are interested in a recommender model leveraging item network. We therefore seek to transform our text auxiliary information into a graph one, i.e., build an item graph encoding textual similarities between them. As we shall see below, Cornac makes this exercise convenient and straightforward thanks to its *Modality* support.\n",
    "\n",
    "We first need to build a vector-based representation of our raw texts, i.e., item-word matrix. Importantly, we don't have to worry about how to generate such representation, the Cornac's `TextModality` class implements the necessary routines for this purpose:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from cornac.data import TextModality\n",
    "from cornac.data.text import BaseTokenizer\n",
    "\n",
    "# build text modality\n",
    "item_text_modality = TextModality(corpus=item_texts, ids=item_ids,\n",
    "                                tokenizer=BaseTokenizer(sep=' ', stop_words='english'),\n",
    "                                max_vocab=5000, max_doc_freq=0.5)\n",
    "item_text_modality.build()\n",
    "\n",
    "# get the item-word count matrix \n",
    "item_word_mat = item_text_modality.count_matrix.A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, all we need to do is to instantiate a `GraphModality` from our text feature matrix as in the code below, which under the hood will construct a k-nearest neighbor graph of items, encoding textual similarities among them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from cornac.data import GraphModality\n",
    "\n",
    "item_graph_modality = GraphModality.from_feature(features=item_word_mat,ids=item_ids,\n",
    "                                                 k=5, symmetric=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more details on instantiating a `GraphModality` from features, one can refer to Cornac's [documentation](https://cornac.readthedocs.io/en/latest/data.html#module-cornac.data.graph). At this level we can proceed as if we were given some item network and fit any recommender model leveraging such auxiliary information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fiting and evaluating our recommender model\n",
    "\n",
    "As a recommendation algorithm integrating item network we use [Matrix Co-Factorization (MCF)](http://papers.www2017.com.au.s3-website-ap-southeast-2.amazonaws.com/proceedings/p1113.pdf), other choices are possible, e.g., [C2PF](https://www.ijcai.org/proceedings/2018/0370.pdf). We further include [Probabilistic Matrix Factorization (PMF)](https://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf) as a baseline to assess the impact of the auxiliary information. Recall that, without the auxiliary information MCF reduces to PMF. To measure performance, we retain three metrics, namely Precision@50, Recall@50 and RMSE. All models and metrics are already implemented in Cornac, making such an experiment straightforward.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from cornac.eval_methods import RatioSplit\n",
    "\n",
    "# train/test data spliting\n",
    "ratio_split = RatioSplit(data=ratings, test_size=0.2, exclude_unknowns=True,\n",
    "                         item_graph=item_graph_modality, verbose=True, seed=123)\n",
    "\n",
    "# instantiate the recommender models\n",
    "pmf = cornac.models.PMF(k=40, max_iter=100, learning_rate=0.001, verbose=True, seed=123)\n",
    "mcf = cornac.models.MCF(k=40, max_iter=100, learning_rate=0.001, verbose=True, seed=123)\n",
    "\n",
    "# instantiate evaluation metrics\n",
    "rmse = cornac.metrics.RMSE()\n",
    "pre = cornac.metrics.Precision(k=50)\n",
    "rec = cornac.metrics.Recall(k=50)\n",
    "\n",
    "# instantiate and run your experiment\n",
    "exp = cornac.Experiment(eval_method=ratio_split,\n",
    "                        models=[pmf,mcf],\n",
    "                        metrics=[rmse, pre, rec],\n",
    "                        user_based=True)\n",
    "exp.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Output:**\n",
    "<pre>\n",
    "    |   RMSE | Precision@50 | Recall@50 | Train (s) | Test (s)\n",
    "--- + ------ + ------------ + --------- + --------- + --------\n",
    "PMF | 1.5082 |       0.0024 |    0.1002 |    5.6788 |   1.2293\n",
    "MCF | 1.2823 |       0.0043 |    0.1846 |   10.1189 |   1.1811\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Clearly, MCF offers a substantial improvement over PMF. This highlights the importance of cross-modality transformations, which allowed us to take advantage of the available auxiliary data, namely item textual descriptions, while using the MCF model integrating item network into personalized recommendation.   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further discussion\n",
    "\n",
    "*From image to graph*. If we were given some visual auxiliary data (e.g., item images), then we can take the same approach and instantiate our `GraphModality` using visual features. The only difference is that we would rely on the `ImageModality` to prepare our visual features. \n",
    "\n",
    "*Cross-modality comparisons.* Note that beyond addressing the lack of a specific type of auxiliary data, another benefit of the kind of transformations addressed in this tutorial, is to enable quick and convenient cross-modality comparisons. In our example, if we have access to both visual and textual item content, we can quickly investigate which auxiliary information helps most, given MCF as our recommender model."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
